Abstract

We consider a continuous-time stochastic volatility model. The 
model contains a stationary volatility process, the multivariate density 
of the finite dimensional distributions of which we aim to estimate. We 
assume that we observe the process at discrete instants in time. The 
sampling times will be equidistant with vanishing distance. 

A multivariate Fourier-type deconvolution kernel density estimator
based on the logarithm of the squared processes is proposed to estimate
the multivariate volatility density. An expansion of the bias and
a bound on the variance are derived.

1 Introduction

Let $S$ denote the log price process of some stock in a financial market. It
is often assumed that $S$ can be modelled as the solution of a stochastic
differential equation or, more generally, as an Itô diffusion process. So we
assume that we can write

$$dS_t = b_t\,dt + \sigma_t\,dW_t, \qquad S_0 = 0, \tag{1}$$

or, in integral form,

$$S_t = \int_0^t b_s\,ds + \int_0^t \sigma_s\,dW_s, \tag{2}$$

where $W$ is a standard Brownian motion and the processes $b$ and $\sigma$ are
assumed to satisfy certain regularity conditions (see Karatzas and Shreve
(1991)) to have the integrals in (2) well-defined. In a financial context, the
process $\sigma$ is called the volatility process. One usually takes the process $\sigma$
independent of the Brownian motion $W$.

In this paper we adopt this independence assumption and we model
$\sigma$ as a strictly stationary positive process satisfying a mixing condition,
for example an ergodic diffusion on $(0, \infty)$. We will assume that all
$p$-dimensional marginal distributions of $\sigma$ have invariant densities with respect
to the Lebesgue measure on $(0, \infty)^p$. This is typically the case in virtually
all stochastic volatility models that are proposed in the literature, where the
evolution of $\sigma$ is modelled by a stochastic differential equation, mostly in
terms of $\sigma^2$, or $\log \sigma^2$ (cf. e.g. Wiggins (1987), Heston (1993)).

As a motivation for nonparametric estimation procedures we consider
stochastic differential equations of the type

$$dX_t = b(X_t)\,dt + a(X_t)\,dB_t$$

with $B$ equal to Brownian motion. Focussing on the invariant univariate
density, we recall that it is up to a multiplicative constant equal to

$$x \mapsto \frac{1}{a^2(x)} \exp\Big(2 \int_{x_0}^{x} \frac{b(y)}{a^2(y)}\,dy\Big), \tag{3}$$

where $x_0$ is an arbitrary element of the state space $(l, r)$, see e.g. Gihman
and Skorohod (1972) or Skorokhod (1989). From formula (3) one sees that
the invariant distribution of the volatility process (take $X$ for instance equal
to $\sigma^2$ or $\log \sigma^2$) may take on many different forms, as is the case for the
various models that have been proposed in the literature. Refraining from
parametric assumptions on the functions $a$ and $b$, nonparametric statistical
procedures may be used to obtain information about the shape of the
(one-dimensional) invariant distribution.

A phenomenon that is often observed in practice is volatility clustering.
This means that for different time instants $t_1, \ldots, t_p$ that are close, the
corresponding values of $\sigma_{t_1}, \ldots, \sigma_{t_p}$ are close again. This can partly be
explained by assumed continuity of the process $\sigma$, but it might also result
from specific areas where the multivariate density of $(\sigma_{t_1}, \ldots, \sigma_{t_p})$ assumes
high values. For instance, it is conceivable that for $p = 2$, the density of
$(\sigma_{t_1}, \sigma_{t_2})$ has high concentrations around points $(\ell, \ell)$ and $(h, h)$, with $\ell < h$,
a kind of bimodality on the diagonal of the joint distribution, with the
interpretation that clustering occurs around a low value $\ell$ or around a high
value $h$.

Here is an example where this happens. We consider a regime switching
volatility process. Assume that for $i = 0, 1$ we have two stationary processes
$X^i$, each of them having multivariate invariant distributions having densities.
Call these $f^i_{t_1,\ldots,t_p}(x_1, \ldots, x_p)$, whereas for $p = 1$ we simply write $f^i$.
We assume these two processes to be independent, and also independent of
a two-state homogeneous Markov chain $U$ with states 0, 1. Let $Q(t)$ be the
matrix of transition probabilities $q_{ij}(t) = P(U_t = i \mid U_0 = j)$. Let $A$ be the
matrix of transition intensities and write

$$A = \begin{pmatrix} -a_0 & a_1 \\ a_0 & -a_1 \end{pmatrix},$$

with $a_0, a_1 > 0$. Then $\dot{Q}(t) = AQ(t)$, and $Q(t) = \exp(tA)$.
The stationary distribution of $U$ is given by $\pi_i := P(U_t = i) = \frac{a_{1-i}}{a_0 + a_1}$ and
we assume that $U_0$ has this distribution. We finally define the process $\xi$ by

$$\xi_t = U_t X_t^1 + (1 - U_t) X_t^0.$$

Then $\xi$ is stationary too and it has a bivariate stationary distribution with
a density, related by $P(\xi_s \in dx, \xi_t \in dy) = f_{s,t}(x, y)\,dx\,dy$. Elementary
calculations lead to the following expression for $f_{s,t}$ for $0 < s < t$:

$$f_{s,t}(x, y) = q_{11}(t-s)\,\pi_1 f^1_{s,t}(x, y) + q_{10}(t-s)\,\pi_0 f^0(x) f^1(y) + q_{01}(t-s)\,\pi_1 f^1(x) f^0(y) + q_{00}(t-s)\,\pi_0 f^0_{s,t}(x, y).$$

Suppose that the volatility process is defined by $\sigma_t = \exp(\xi_t)$ and that the
$X^i$ are both Ornstein-Uhlenbeck processes given by

$$dX_t^i = -a(X_t^i - \mu_i)\,dt + b\,dW_t^i,$$

with $W^0, W^1$ independent Brownian motions, $\mu_0 \neq \mu_1$ and $a > 0$.
Suppose that the $X^i$ start in their stationary $N(\mu_i, \frac{b^2}{2a})$ distributions. Then the
centre of the distribution of $(X_s^i, X_t^i)$ is $(\mu_i, \mu_i)$, whereas the centre of the
distribution of $(X_s^0, X_t^1)$ is $(\mu_0, \mu_1)$. Hence the density $f_{s,t}$ is a mixture of
four hump shaped contours, each of them having a different centre of location.
If $t - s$ is small, the off-diagonal transition probabilities $q_{10}(t-s)$ and $q_{01}(t-s)$
are close to zero, so this effectively reduces to a mixture of distributions with
centres $(\mu_0, \mu_0)$ and $(\mu_1, \mu_1)$.
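
The mixture formula for $f_{s,t}$ can be evaluated numerically. The following is a minimal sketch, not part of the original paper: the function names and the parameter values in the usage note are illustrative assumptions, and the closed form used for $Q(t)$ is the standard one for a two-state chain with the intensity matrix $A$ given above.

```python
import math

def stationary_distribution(a0, a1):
    """(pi_0, pi_1) for intensities a0 (state 0 -> 1) and a1 (state 1 -> 0)."""
    total = a0 + a1
    return (a1 / total, a0 / total)

def transition_matrix(a0, a1, t):
    """Q(t) = exp(tA), with Q[i][j] = P(U_t = i | U_0 = j), in closed form."""
    pi = stationary_distribution(a0, a1)
    decay = math.exp(-(a0 + a1) * t)
    return [[pi[i] + ((1.0 if i == j else 0.0) - pi[i]) * decay for j in range(2)]
            for i in range(2)]

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def bivariate_ou_pdf(x, y, mean, var, corr):
    """Stationary density of (X_s, X_t) for an Ornstein-Uhlenbeck process."""
    zx, zy = x - mean, y - mean
    quad = (zx * zx - 2.0 * corr * zx * zy + zy * zy) / (var * (1.0 - corr * corr))
    return math.exp(-0.5 * quad) / (2.0 * math.pi * var * math.sqrt(1.0 - corr * corr))

def mixture_density(x, y, lag, a0, a1, mu, a, b):
    """f_{s,t}(x, y) for lag = t - s > 0, following the four-term mixture formula."""
    pi = stationary_distribution(a0, a1)
    q = transition_matrix(a0, a1, lag)
    var = b * b / (2.0 * a)          # stationary variance b^2 / (2a)
    corr = math.exp(-a * lag)        # stationary OU autocorrelation
    total = 0.0
    for j in range(2):               # regime at time s
        for i in range(2):           # regime at time t
            weight = q[i][j] * pi[j]
            if i == j:
                dens = bivariate_ou_pdf(x, y, mu[i], var, corr)
            else:
                dens = normal_pdf(x, mu[j], var) * normal_pdf(y, mu[i], var)
            total += weight * dens
    return total
```

For instance, with the assumed values `mu = (-1.0, 1.0)`, `a = b = a0 = a1 = 1.0` and `lag = 0.01`, the value of `mixture_density` at the diagonal point `(-1, -1)` is several hundred times larger than at the off-diagonal point `(-1, 1)`, which is the bimodality on the diagonal described above.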

Nonparametric procedures are able to detect such a property of a bivariate
distribution, and are consequently sensible tools to get
some partial insight into the behaviour of the volatility.

In the present paper we propose a nonparametric estimator for the multivariate
density of the volatility process. Using ideas from deconvolution
theory, we will propose a procedure for the estimation of this density at a
number of fixed time instants. Related work on estimating a univariate density
has been done by Van Es et al. (2003), Comte and Genon-Catalot (2006),
Van Zanten and Zareba (2008), whereas a deconvolution approach has also
been adopted to estimate a regression function for a discrete time stochastic
volatility model by Franke et al. (2003), Comte (2004) and Comte et
al. (2008).

The observations of the log-asset price process $S$ are assumed to take place
at the time instants $\Delta, 2\Delta, \ldots, n\Delta$, where the time gap satisfies $\Delta = \Delta_n \to 0$
and $n\Delta_n \to \infty$ as $n \to \infty$. This means that we base our estimator on
so-called high frequency data.

To assess the quality of our procedure, we will study how the bias and
variance of the estimator behave under these assumptions. In Van Es et 
al. (2003) this problem has been studied for the marginal univariate density 
of a. The multivariate study of the present paper largely builds on the ap- 
proach of the cited paper, in particular we will rely on a number of technical 
results that are contained in it, but also we will borrow ideas from Van Es et 
al. (2005), where a multivariate problem for discrete time models has been 
studied. Nevertheless, we will encounter a number of technical problems 
that are not present in the univariate case, nor in the multivariate case for 
discrete time models. 

The remainder of the paper is organized as follows. In the next section,
Section 2, we give the heuristic arguments that motivate the definition of our
estimator. In Section 3 the main results concerning the asymptotic behaviour
of the estimator are presented and discussed. The proofs of the
main theorems are given in Section 4. They are based on a number of
technical lemmas, whose proofs are collected in Section 5.

2 Construction of the estimator 

To motivate the construction of the estimator, we first consider (1) without
the drift term, so we assume to have

$$dS_t = \sigma_t\,dW_t, \qquad S_0 = 0.$$

It is assumed that we observe the process $S$ at the discrete time instants $0$,
$\Delta, 2\Delta, \ldots, n\Delta$. For $i = 1, 2, \ldots$ we work, as in Genon-Catalot et al. (1998,
1999), with the normalized increments

$$X_i^\Delta = \frac{1}{\sqrt{\Delta}}\big(S_{i\Delta} - S_{(i-1)\Delta}\big) = \frac{1}{\sqrt{\Delta}} \int_{(i-1)\Delta}^{i\Delta} \sigma_t\,dW_t. \tag{4}$$

For small $\Delta$, we have the rough approximation

$$X_i^\Delta \approx \sigma_{(i-1)\Delta}\,\frac{1}{\sqrt{\Delta}}\big(W_{i\Delta} - W_{(i-1)\Delta}\big) = \sigma_{(i-1)\Delta} Z_i^\Delta, \tag{5}$$

where for $i = 1, 2, \ldots$ we define

$$Z_i^\Delta = \frac{1}{\sqrt{\Delta}}\big(W_{i\Delta} - W_{(i-1)\Delta}\big).$$

By the independence and stationarity of Brownian increments, the sequence
$Z_1^\Delta, Z_2^\Delta, \ldots$ is an i.i.d. sequence of standard normal random variables.
Moreover, the sequence is independent of the process $\sigma$ by assumption.

Let us first describe the univariate density estimator. Taking the logarithm
of the square of $X_i^\Delta$ we get

$$\log\big((X_i^\Delta)^2\big) \approx \log\big(\sigma^2_{(i-1)\Delta}\big) + \log\big((Z_i^\Delta)^2\big), \tag{6}$$

where the terms in the sum are independent. Assuming that the approximation
is sufficiently accurate we can use this approximate convolution
structure to estimate the unknown density $f$ of $\log(\sigma^2_{i\Delta})$ from the observed
$\log((X_i^\Delta)^2)$.
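
The transformation from observed log prices to the quantities in (6) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, and the simulator uses a constant volatility, for which the approximation (5) holds exactly and $\log((X_i^\Delta)^2) - \log \sigma^2$ is distributed as $\log Z^2$ with $Z$ standard normal.

```python
import math
import random

def normalized_increments(log_prices, delta):
    """X_i = (S_{i*delta} - S_{(i-1)*delta}) / sqrt(delta), i = 1, ..., n.

    `log_prices` holds S_0, S_delta, ..., S_{n*delta}.
    """
    root = math.sqrt(delta)
    return [(log_prices[i] - log_prices[i - 1]) / root
            for i in range(1, len(log_prices))]

def log_squared(increments):
    """log(X_i^2); exact zeros have no finite logarithm and are skipped."""
    return [math.log(x * x) for x in increments if x != 0.0]

def simulate_constant_volatility(n, delta, sigma, seed=0):
    """Driftless log price path dS = sigma dW on the grid 0, delta, ..., n*delta."""
    rng = random.Random(seed)
    path = [0.0]
    for _ in range(n):
        path.append(path[-1] + sigma * math.sqrt(delta) * rng.gauss(0.0, 1.0))
    return path
```

With a constant volatility the sample mean of `log_squared(...)` minus $\log \sigma^2$ should be close to $E \log Z^2 = -(\gamma_E + \log 2) \approx -1.27$, where $\gamma_E$ is Euler's constant; this is the location shift introduced by the 'noise' term in (6).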

Before we can define the estimator, we need some more notation. Observe
that the density of the 'noise' $\log((Z_i^\Delta)^2)$, denoted by $k$, is given by

$$k(x) = \frac{1}{\sqrt{2\pi}}\,e^{\frac{1}{2}x}\,e^{-\frac{1}{2}e^{x}}. \tag{7}$$

The characteristic function of the density $k$ is denoted by $\phi_k$. We
have $\phi_k(t) = \frac{1}{\sqrt{\pi}}\,2^{it}\,\Gamma(\frac{1}{2} + it)$ and its asymptotic expansion
$|\phi_k(t)| = \sqrt{2}\,e^{-\frac{1}{2}\pi|t|}\big(1 + O(\frac{1}{|t|})\big)$, for $|t| \to \infty$, see Van Es et al. (2003).
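
The formula for $\phi_k$ and its tail behaviour can be checked numerically. The sketch below is an illustration, not taken from the paper: since the Python standard library has no complex Gamma function, it uses the standard Lanczos approximation (with the widely used $g = 7$, nine-coefficient table), and the function names are assumptions. The reflection identity $|\Gamma(\frac12 + it)|^2 = \pi / \cosh(\pi t)$ gives $|\phi_k(t)|^2 = 1/\cosh(\pi t)$ exactly, which provides an independent check.

```python
import cmath
import math

_LANCZOS_G = 7
_LANCZOS_COEFFS = [
    0.99999999999980993, 676.5203681218851, -1259.1392167224028,
    771.32342877765313, -176.61502916214059, 12.507343278686905,
    -0.13857109526572012, 9.9843695780195716e-6, 1.5056327351493116e-7,
]

def complex_gamma(z):
    """Lanczos approximation of the Gamma function for complex arguments."""
    z = complex(z)
    if z.real < 0.5:
        # Reflection formula for the left half of the plane.
        return math.pi / (cmath.sin(math.pi * z) * complex_gamma(1.0 - z))
    z -= 1.0
    acc = _LANCZOS_COEFFS[0]
    for i in range(1, len(_LANCZOS_COEFFS)):
        acc += _LANCZOS_COEFFS[i] / (z + i)
    t = z + _LANCZOS_G + 0.5
    return math.sqrt(2.0 * math.pi) * t ** (z + 0.5) * cmath.exp(-t) * acc

def phi_k(t):
    """Characteristic function of log(Z^2), Z standard normal."""
    return cmath.exp(1j * t * math.log(2.0)) * complex_gamma(0.5 + 1j * t) / math.sqrt(math.pi)

def tail_ratio(t):
    """|phi_k(t)| divided by its leading asymptotic term sqrt(2) * exp(-pi |t| / 2)."""
    return abs(phi_k(t)) / (math.sqrt(2.0) * math.exp(-0.5 * math.pi * abs(t)))
```

The ratio returned by `tail_ratio` tends to one as $|t|$ grows, so $|\phi_k|$ decays exponentially; this exponential decay is what makes the deconvolution problem hard.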

We will use a kernel function $w$ satisfying the following condition. For
examples of such kernels see Wand (1998).

Condition 2.1. Let $w$ be a real symmetric function with real valued symmetric
characteristic function $\phi_w$ having support $[-1,1]$. Assume further

1. $\int_{-\infty}^{\infty} |w(u)|\,du < \infty$, $\int_{-\infty}^{\infty} w(u)\,du = 1$, $\int_{-\infty}^{\infty} u^2 |w(u)|\,du < \infty$,

2. $\phi_w(1 - t) = A t^\rho + o(t^\rho)$, as $t \downarrow 0$, for some $\rho > 0$.
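
As an illustration of the second requirement, consider the choice $\phi_w(s) = (1 - s^2)^3$ on $[-1, 1]$, which is common in the deconvolution literature. It is an assumed example here, not necessarily the kernel the authors have in mind. Expanding near the endpoint of the support gives the constants $A$ and $\rho$ directly:

```latex
\phi_w(1 - t) = \bigl(1 - (1 - t)^2\bigr)^3
              = (2t - t^2)^3
              = 8 t^3 \Bigl(1 - \tfrac{t}{2}\Bigr)^3
              = 8 t^3 + o(t^3), \qquad t \downarrow 0,
```

so that Condition 2.1 holds with $A = 8$ and $\rho = 3$. Moreover $\int w(u)\,du = \phi_w(0) = 1$, and $w$ is real and symmetric because $\phi_w$ is.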

Following a well-known approach in statistical deconvolution theory, we
use a deconvolution kernel density estimator, see e.g. Section 6.2.4 of Wand
and Jones (1995). Having the characteristic functions $\phi_w$ and $\phi_k$ at our
disposal, choosing a positive bandwidth $h$, we introduce the kernel function

$$v_h(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{\phi_w(s)}{\phi_k(s/h)}\,e^{-isx}\,ds \tag{8}$$

and the density estimator of the univariate density $f$ given by

$$f_{nh}(x) = \frac{1}{nh} \sum_{j=1}^{n} v_h\Big(\frac{x - \log((X_j^\Delta)^2)}{h}\Big). \tag{9}$$

One easily verifies that the function $v_h$, and therefore also the estimator $f_{nh}$,
is real-valued. In Van Es et al. (2003) a bias expansion and bounds on the
variance of $f_{nh}(x)$ have been obtained.
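
A minimal numerical sketch of the kernel (8) and the estimator (9) follows. Several ingredients are assumptions made for illustration only: the kernel with $\phi_w(s) = (1 - s^2)^3$, the Lanczos approximation of the complex Gamma function, Simpson's rule for the integral over $s$, and all function names. Because $\phi_w$ is real and even and $\phi_k(-t) = \overline{\phi_k(t)}$, the integral reduces to $\frac{1}{\pi} \int_0^1 \mathrm{Re}\big[\phi_w(s) e^{-isx} / \phi_k(s/h)\big]\,ds$.

```python
import cmath
import math

_LANCZOS = [
    0.99999999999980993, 676.5203681218851, -1259.1392167224028,
    771.32342877765313, -176.61502916214059, 12.507343278686905,
    -0.13857109526572012, 9.9843695780195716e-6, 1.5056327351493116e-7,
]

def complex_gamma(z):
    """Lanczos approximation (g = 7); used here only with Re(z) = 1/2."""
    z = complex(z)
    if z.real < 0.5:
        return math.pi / (cmath.sin(math.pi * z) * complex_gamma(1.0 - z))
    z -= 1.0
    acc = _LANCZOS[0]
    for i in range(1, len(_LANCZOS)):
        acc += _LANCZOS[i] / (z + i)
    t = z + 7.5
    return math.sqrt(2.0 * math.pi) * t ** (z + 0.5) * cmath.exp(-t) * acc

def phi_k(t):
    """Characteristic function of the noise density k in (7)."""
    return cmath.exp(1j * t * math.log(2.0)) * complex_gamma(0.5 + 1j * t) / math.sqrt(math.pi)

def phi_w(s):
    """Assumed example kernel: phi_w(s) = (1 - s^2)^3 on [-1, 1]."""
    return (1.0 - s * s) ** 3 if abs(s) <= 1.0 else 0.0

def make_deconvolution_kernel(h, grid_size=200):
    """Return the function x -> v_h(x) of (8); grid_size must be even."""
    step = 1.0 / grid_size
    nodes = []
    for m in range(grid_size + 1):
        s = m * step
        simpson = 1.0 if m in (0, grid_size) else (4.0 if m % 2 else 2.0)
        nodes.append((s, simpson * step / 3.0 * phi_w(s) / phi_k(s / h)))

    def v_h(x):
        total = 0.0
        for s, c in nodes:
            # Real part of c * exp(-i s x).
            total += c.real * math.cos(s * x) + c.imag * math.sin(s * x)
        return total / math.pi

    return v_h

def density_estimate(x, log_squared_obs, h, v_h=None):
    """Univariate estimator (9) evaluated at x."""
    v_h = v_h or make_deconvolution_kernel(h)
    n = len(log_squared_obs)
    return sum(v_h((x - y) / h) for y in log_squared_obs) / (n * h)
```

Precomputing the ratio $\phi_w(s)/\phi_k(s/h)$ on the grid keeps repeated evaluations of $v_h$ cheap. Note that for small $h$ the ratio becomes very large near $|s| = 1$, which reflects the large variance of the estimator discussed in Section 3.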
In the present paper we will extend these results to a multivariate
setting, in which we will estimate the density $f(\mathbf{x}) = f_{t_1,\ldots,t_p}(\mathbf{x})$, with $\mathbf{x} =
(x_1, \ldots, x_p)$, of a vector $(\log \sigma^2_{t_1}, \ldots, \log \sigma^2_{t_p})$. Here the $0 < t_1 < \ldots < t_p$
denote $p$ pre-specified time points. Below we use boldface expressions for
(random) vectors. The expression for the estimator of this density will be
seen to be analogous to the estimator in the univariate case, that has been
analyzed in Van Es et al. (2003), and exhibits some similarity with the
estimator of a similar multivariate density in a discrete time framework as
treated in Van Es et al. (2005).

What one ideally needs to estimate $f(\mathbf{x})$ are observations of $p$-dimensional
random vectors that all have a density equal to $f$. This happens
under the observation scheme that we have introduced previously, if
the $t_k$ are multiples of $\Delta$, $t_k = i_k\Delta$ say. In that case, one should use
$(X^\Delta_{i_1 + j}, \ldots, X^\Delta_{i_p + j})$ for all the values of $j$ that are given by the observations.
The complicating factor is, however, that the $t_k$ are not given as
multiples of $\Delta$, which on the other hand would lead to an uninteresting
estimation problem, if $\Delta \to 0$. Note also that this kind of problem is not
present when one aims at estimating a univariate marginal density of $\log \sigma_t^2$.
All $\log \sigma_t^2$, $t > 0$, have the same marginal density.

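We approach the problem as follows. Let us first introduce some auxiliary
notation. Write $(i_1^\Delta, \ldots, i_p^\Delta)$ for the vector $([t_1/\Delta], \ldots, [t_p/\Delta])$, where
$[\cdot]$ denotes the floor function. We use $\mathbf{X}_j^\Delta$ to denote the random vectors of
length $p$

$$\mathbf{X}_j^\Delta = \big(X_j^\Delta, \ldots, X^\Delta_{i_p^\Delta - i_1^\Delta + j}\big), \qquad j = 1, \ldots, n - i_p^\Delta + i_1^\Delta.$$

Hence its $k$-th component is $X^\Delta_{i_k^\Delta - i_1^\Delta + j}$, $k = 1, \ldots, p$. Analogously,
$\log((\mathbf{X}_j^\Delta)^2)$ denotes the vector

$$\log\big((\mathbf{X}_j^\Delta)^2\big) = \Big(\log\big((X_j^\Delta)^2\big), \ldots, \log\big((X^\Delta_{i_p^\Delta - i_1^\Delta + j})^2\big)\Big).$$

Anywhere else in the sequel, we adhere to a similar notation. Functions of a
vector are assumed to be evaluated componentwise, yielding again a vector.
Note that $\mathbf{X}_j^\Delta$ is, by virtue of (5), approximately equal to the vector

$$\tilde{\mathbf{X}}_j := \big(\sigma_{(j-1)\Delta} Z_j^\Delta, \ldots, \sigma_{(i_p^\Delta - i_1^\Delta + j - 1)\Delta} Z^\Delta_{i_p^\Delta - i_1^\Delta + j}\big) \tag{10}$$

and that $(\log \sigma^2_{(j-1)\Delta}, \ldots, \log \sigma^2_{(i_p^\Delta - i_1^\Delta + j - 1)\Delta})$ has density equal to $f_{i_1^\Delta \Delta, \ldots, i_p^\Delta \Delta}$
for every $j$, because of the assumed stationarity. Since $\Delta \to 0$, one can expect
that $f_{i_1^\Delta \Delta, \ldots, i_p^\Delta \Delta}(\mathbf{x}) \to f_{t_1,\ldots,t_p}(\mathbf{x})$. This motivates us to use the observations
$\mathbf{X}_j^\Delta$, or rather the $\log((\mathbf{X}_j^\Delta)^2)$, in the construction of a kernel estimator.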
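
The index bookkeeping above can be sketched in a few lines. The function names are hypothetical and the sketch only assembles the vectors $\log((\mathbf{X}_j^\Delta)^2)$; it assumes all increments are nonzero.

```python
import math

def time_indices(times, delta):
    """i_k = floor(t_k / delta) for each pre-specified time point t_k."""
    return [math.floor(t / delta) for t in times]

def lagged_log_squared_vectors(increments, times, delta):
    """Vectors (log X_j^2, ..., log X_{i_p - i_1 + j}^2) for j = 1, ..., n - i_p + i_1.

    `increments` holds X_1, ..., X_n, so X_j is stored at position j - 1.
    """
    idx = time_indices(times, delta)
    offsets = [i - idx[0] for i in idx]      # i_k - i_1 for k = 1, ..., p
    count = len(increments) - idx[-1] + idx[0]
    vectors = []
    for j in range(1, count + 1):
        vectors.append([math.log(increments[j + off - 1] ** 2) for off in offsets])
    return vectors
```

For example, with `delta = 0.25` and `times = (0.5, 1.3)` the indices are 2 and 5, so each vector pairs an increment with the one three grid steps later, and five increments yield two such vectors.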

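The kernel $w$ that we will use in the multivariate case is just a product
kernel, $w(\mathbf{x}) = \prod_{j=1}^{p} w(x_j)$. Likewise we take $k(\mathbf{x}) = \prod_{j=1}^{p} k(x_j)$ and the
Fourier transforms $\phi_w$ and $\phi_k$ factorize as well. Let $\mathbf{v}_h$ be defined by

$$\mathbf{v}_h(\mathbf{x}) = \frac{1}{(2\pi)^p} \int_{\mathbb{R}^p} \frac{\phi_w(\mathbf{s})}{\phi_k(\mathbf{s}/h)}\,e^{-i\,\mathbf{s} \cdot \mathbf{x}}\,d\mathbf{s}, \tag{11}$$

where $\mathbf{s} \in \mathbb{R}^p$ and $\cdot$ denotes inner product. Notice that we also have the
factorization $\mathbf{v}_h(\mathbf{x}) = \prod_{j=1}^{p} v_h(x_j)$.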

We finish this section by presenting the multivariate density estimator $f_{nh}(\mathbf{x})$
that we will use to estimate $f(\mathbf{x})$. It is given by

$$f_{nh}(\mathbf{x}) = \frac{1}{(n - i_p^\Delta + i_1^\Delta) h^p} \sum_{j=1}^{n - i_p^\Delta + i_1^\Delta} \mathbf{v}_h\Big(\frac{\mathbf{x} - \log((\mathbf{X}_j^\Delta)^2)}{h}\Big). \tag{12}$$

Note that this estimator bears some similarity to, but also differs from, the
corresponding one for a discrete time model in Van Es et al. (2005), where
the multivariate density of $(\sigma_{t+1}, \ldots, \sigma_{t+p})$ at $p$ consecutive time points is the
object under study.

Under the assumption that the function $v_h(x)$ of (8) integrates to one,
an estimator of $f_{t_1,\ldots,t_{p-1}}(x_1, \ldots, x_{p-1})$ is obtained by integrating out the
variable $x_p$ in (12), which is of similar appearance. Further integration over
the variables $x_2, \ldots, x_{p-1}$ reduces this estimator to the estimator of the
univariate density given by (9) upon the substitution of $n$ by $n - i_p^\Delta + i_1^\Delta$.



3 Results 

To derive the asymptotic behaviour of the estimator, we need a mixing condition
on the process $\sigma$. For the sake of clarity, we recall the basic definitions.
For a certain process $X$ let $\mathcal{F}_a^b$ be the $\sigma$-algebra of events generated by the
random variables $X_t$, $a \le t \le b$. The mixing coefficient $\alpha(t)$ is defined by

$$\alpha(t) = \sup_{A \in \mathcal{F}_{-\infty}^{0},\ B \in \mathcal{F}_{t}^{\infty}} |P(A \cap B) - P(A)P(B)|. \tag{13}$$

The process $X$ is called strongly mixing if $\alpha(t) \to 0$ as $t \to \infty$.

As we mentioned in the introduction, it is common practice to model the
volatility process $V = \sigma^2$ as the stationary, ergodic solution of an SDE of
the form

$$dV_t = b(V_t)\,dt + a(V_t)\,dB_t.$$

It is easily verified that for such processes it holds that $E|V_t - V_0| = O(t^{1/2})$,
provided that $b \in L_1(\mu)$ and $a \in L_2(\mu)$, where $\mu$ is the invariant probability
measure. Indeed we have

$$E|V_t - V_0| \le E\int_0^t |b(V_s)|\,ds + \Big(E\int_0^t a^2(V_s)\,ds\Big)^{1/2} = t\,\|b\|_{L_1(\mu)} + t^{1/2}\,\|a\|_{L_2(\mu)}.$$

In this setup, the process $V$ is strong mixing, see for
instance Corollary 2.1 of Genon-Catalot et al. (2000). Although we will not
assume explicitly that $\sigma^2$ solves an SDE, the above observations motivate
the following condition.

Condition 3.1. (i) The process $\sigma^2$ is $L^1$-Hölder continuous of order one half,
$E|\sigma_t^2 - \sigma_0^2| = O(t^{1/2})$ for $t \to 0$.

(ii) The process $\sigma$ is strongly mixing with coefficient $\alpha(t)$ satisfying, for some
$0 < q < 1$,

$$\int_0^\infty \alpha(t)^q\,dt < \infty. \tag{14}$$

Remark 3.2. Since the mixing coefficients $\alpha(t)$ are non-increasing in $t$,
condition (14) is equivalent to the following. For all $t \in \mathbb{R}$ there exists
$C(q, t)$ such that for all $\Delta > 0$

$$\sum_{k=0}^{\infty} \alpha(k\Delta + t)^q \le \frac{C(q, t)}{\Delta}, \tag{15}$$

where $\alpha(t)$ is set equal to 1 for $t < 0$.

Our main theorems are multivariate versions of results in Van Es et al. (2003)
which describe the asymptotic behaviour of the univariate density estimator.
Note that they also cover the case where there is a drift $b_t$ present in equation
(1). The condition on the drift is boundedness of $E b_t^2$. This condition is
typically satisfied in realistic models for the log-returns of a stock, since $b_t$
is the local rate of return and this will be mostly bounded itself.

Theorem 3.3. Assume that $E b_t^2$ is bounded. Let the kernel function $w$
satisfy Condition 2.1. Let the density $f_{t_1,\ldots,t_p}(\mathbf{x})$ of $(\log \sigma^2_{t_1}, \ldots, \log \sigma^2_{t_p})$ be
continuous, twice continuously differentiable with a bounded second derivative
and Lipschitz in $t_1, \ldots, t_p$, uniformly in $\mathbf{x}$. Assume that the first of
Condition 3.1 holds and that the invariant density of $\sigma_t^2$ is bounded in a
neighbourhood of zero. Suppose that $\Delta = n^{-\delta}$ for given $0 < \delta < 1$ and
choose $h = \gamma\pi/\log n$, where $\gamma > 4p/\delta$. Then the bias of the estimator (12)
satisfies

$$E f_{nh}(\mathbf{x}) = f_{t_1,\ldots,t_p}(\mathbf{x}) + \tfrac{1}{2} h^2 \int \mathbf{u}^T \nabla^2 f(\mathbf{x})\,\mathbf{u}\,w(\mathbf{u})\,d\mathbf{u} + o(h^2) + O(\Delta). \tag{16}$$

Theorem 3.4. Assume that $E b_t^2$ is bounded. Let the kernel function
$w$ satisfy Condition 2.1. Assume that Condition 3.1 holds, that
$\int |w(u)|^{2/(1-q)}\,du < \infty$, where $q$ is as in (14), and that the invariant density
of $\sigma_t^2$ is bounded in a neighbourhood of zero. Suppose that $\Delta = n^{-\delta}$ for given
$0 < \delta < 1$ and choose $h = \gamma\pi/\log n$, where $\gamma > 4p/\delta$. The variance of the
estimator satisfies

$$\mathrm{Var}\,f_{nh}(\mathbf{x}) = O\Big(\frac{1}{n}\,h^{2p\rho}\,e^{p\pi/h}\Big) + O\Big(\frac{1}{n\,h^{p(1+q)}\,\Delta}\Big). \tag{17}$$

Corollary 3.5. Under the assumptions of Theorems 3.3 and 3.4 the
bias is proportional to $\gamma^2\pi^2(\log n)^{-2}(1 + o(1))$ and the order of the variance is
$n^{-1+\delta}(\log n)^{p(1+q)}$. Hence the mean squared error of the estimator $f_{nh}(\mathbf{x})$ is
of order $(\log n)^{-4}$.

Proof. The choices $\Delta = n^{-\delta}$, with $0 < \delta < 1$, and $h = \gamma\pi/\log n$, with
$\gamma > 4p/\delta$, render a variance that is of order $n^{-1+p/\gamma}(1/\log n)^{2p\rho}$ for the first
term of (17) and $n^{-1+\delta}(\log n)^{p(1+q)}$ for the second term. Since by assumption
$\gamma > 4p/\delta$ we have $p/\gamma < \delta/4 < \delta$, so the second term dominates the first
term. The order of the variance is thus $n^{-1+\delta}(\log n)^{p(1+q)}$. Of course, the
order of the bias is logarithmic, hence the bias dominates the variance and
the mean squared error of $f_{nh}(\mathbf{x})$ is also logarithmic. □
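
The orders quoted in the proof follow from substituting the bandwidth and the sampling gap into (17). As a worked check, using only $h = \gamma\pi/\log n$ and $\Delta = n^{-\delta}$:

```latex
e^{p\pi/h} = \exp\Bigl(\frac{p \log n}{\gamma}\Bigr) = n^{p/\gamma},
\qquad
\frac{1}{n}\,h^{2p\rho}\,e^{p\pi/h}
   = n^{-1 + p/\gamma}\Bigl(\frac{\gamma\pi}{\log n}\Bigr)^{2p\rho},
\qquad
\frac{1}{n\,h^{p(1+q)}\,\Delta}
   = n^{-1+\delta}\Bigl(\frac{\log n}{\gamma\pi}\Bigr)^{p(1+q)}.
```

The squared bias is of order $h^4 = (\gamma\pi)^4 (\log n)^{-4}$, while both variance terms are negative powers of $n$ up to logarithmic factors, which gives the $(\log n)^{-4}$ rate for the mean squared error.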

The proofs of the theorems are deferred to the next section. We conclude the
present section with a number of comments on the results.

Remark 3.6. The first order bound for the variance coincides with the 
order bound for the variance of the multivariate density estimator in discrete 
time models under the assumption that the volatility process and the error 
process are independent, see Theorem 3.2 in Van Es et al. (2005). The 
second order bound is of the same nature as in the case of estimating a 
univariate density in continuous time models, see Theorem 3.1 in Van Es et 
al. (2003), the difference being that in the multivariate case of the present 
paper one has $h^{p(1+q)}$ in the denominator instead of $h^{1+q}$.

Remark 3.7. We observe some features that parallel some findings for the
univariate case. The expectation of the deconvolution estimator is equal to
the expectation of an ordinary kernel density estimator, as becomes clear
from the proof of Lemma 4.1. It is well-known that the variance of kernel-type
deconvolution estimators heavily depends on the rate of decay to zero of
$|\phi_k(t)|$ as $|t| \to \infty$. The faster the decay the larger the asymptotic variance.
This follows for instance for i.i.d. observations from results in Fan (1991) and
for stationary observations from the work of Masry (1993). The rate of decay
of $|\phi_k(t)|$ for the density $k$ of (7) is given by $|\phi_k(t)| = \sqrt{2}\,e^{-\frac{1}{2}\pi|t|}\big(1 + O(\frac{1}{|t|})\big)$, see
Lemma 5.3 in Van Es et al. (2003). This shows that $k$ is supersmooth, cf. Fan
(1991). By the similarity of the tail of this characteristic function to the
tail of a Cauchy characteristic function we can expect the same order of the
mean squared error as in Cauchy deconvolution problems, where it decreases
logarithmically in $n$, cf. Fan (1991) for results on i.i.d. observations. Note
that this rate, however slow, is faster than the one for normal deconvolution.
Fan (1991) also shows that we cannot expect anything better.

Remark 3.8. The rate of convergence $(\log n)^{-4}$ for the mean squared error
as in Corollary 3.5 has also been found for other estimators. Comte and
Genon-Catalot (2006) use (penalized) projection estimators for $f$. These
estimators are obtained by computing certain projections on large but finite
dimensional subspaces of $L^2(\mathbb{R})$. Under similar assumptions as ours,
they also find the rate of convergence $(\log n)^{-4}$. By sharpening the assumed
smoothness properties of $f$, i.e. fast enough exponential decay of the
characteristic function of $f$, so that $f$ itself is a supersmooth density, they were
able to obtain rates that are even negative powers of $n$.

Van Zanten and Zareba (2008) consider wavelet estimators of the density
of the accumulated squared volatility over intervals of length $\Delta$ with $\Delta$
fixed for the model without drift and with the same observation scheme.
Under similar conditions, they found this rate for the supremum of the mean
integrated squared error, the supremum taken over densities in some Sobolev
ball. For densities satisfying stronger smoothness conditions, they obtained
better rates for their estimators, albeit still negative powers of $\log n$. Both
papers deal with estimating a univariate density only.

Franke, Härdle and Kreiss (2003) consider a discrete time model, where
the evolution of $\log \sigma_t$ is described by a nonlinear autoregression. By adopting
a deconvolution approach they estimate the unknown regression function
and establish tightness of the normalized estimators, where the normalization
again corresponds to the rate that we found.

Remark 3.9. Better bounds on the asymptotic variance can be obtained
under stronger mixing conditions. Consider for instance uniform mixing. In
this case the mixing coefficient $\phi(t)$ is defined for $t > 0$ as

$$\phi(t) = \sup_{A \in \mathcal{F}_{t}^{\infty},\ B \in \mathcal{F}_{-\infty}^{0},\ P(B) > 0} |P(A \mid B) - P(A)| \tag{18}$$

and a process is called uniform mixing if $\phi(t) \to 0$ for $t \to \infty$. Obviously,
uniform mixing implies strong mixing. As a matter of fact, one has the
relation

$$\alpha(t) \le \tfrac{1}{2}\,\phi(t).$$

See Doukhan (1994) for this inequality and many other mixing properties.
If $\sigma$ is uniform mixing with coefficient $\phi$ satisfying $\int_0^\infty \phi(t)^{1/2}\,dt < \infty$, then
the variance bound is given by

Va r /„,, W = (I ft *e^.) +0 (_L_). (19 ) 

The proof of this bound runs similarly to the strong-mixing bound. The 
essential difference is that in equation (I55p we use Theorem 17.2.3 of Ibrag- 
imov and Linnik (1971) with r = instead of Deo's (1973) lemma, as in the 
proof of Theorem 2 in Masry (1983). 

4 Proof of the Theorems 

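We give the proof under the additional assumption that $b_t = 0$. The general
case is an easy consequence.

Let $\mathcal{F}_\sigma$ denote the sigma field generated by the process $\sigma$. For $j =
1, \ldots, n - i_p^\Delta + i_1^\Delta$ we introduce, along with the $\tilde{\mathbf{X}}_j$ of (10), the following
vector notation

$$\boldsymbol{\sigma}_j = \big(\sigma_{(j-1)\Delta}, \ldots, \sigma_{(i_p^\Delta - i_1^\Delta + j - 1)\Delta}\big), \qquad \mathbf{Z}_j^\Delta = \big(Z_j^\Delta, \ldots, Z^\Delta_{i_p^\Delta - i_1^\Delta + j}\big),$$

so that $\tilde{\mathbf{X}}_j$ equals the Hadamard product $\boldsymbol{\sigma}_j \circ \mathbf{Z}_j^\Delta$. Note that since the
$\sigma$ process is defined on the whole real line the $\boldsymbol{\sigma}$ vectors are actually well
defined for all $j$.

Let $\tilde{f}_{nh}$ denote the estimator based on the approximating random vectors
$\tilde{\mathbf{X}}_j$, i.e.

$$\tilde{f}_{nh}(\mathbf{x}) = \frac{1}{(n - i_p^\Delta + i_1^\Delta) h^p} \sum_{j=1}^{n - i_p^\Delta + i_1^\Delta} \mathbf{v}_h\Big(\frac{\mathbf{x} - \log(\tilde{\mathbf{X}}_j^2)}{h}\Big). \tag{20}$$

The proof of (16) is partly based on the following two lemmas, whose proofs
are given in the next section. The first one deals with the expectation of $\tilde{f}_{nh}$.

Lemma 4.1. Let the density $f_{t_1,\ldots,t_p}(\mathbf{x})$ of $(\log \sigma^2_{t_1}, \ldots, \log \sigma^2_{t_p})$ be Lipschitz
in $t_1, \ldots, t_p$, uniformly in $\mathbf{x}$. Then

$$E \tilde{f}_{nh}(\mathbf{x}) = \frac{1}{h^p} \int \cdots \int w\Big(\frac{\mathbf{x} - \mathbf{u}}{h}\Big) f_{t_1,\ldots,t_p}(\mathbf{u})\,d\mathbf{u} + O(\Delta). \tag{21}$$

Notice that, apart from the $O(\Delta)$ term, the equality (21) is the same as
for ordinary multivariate kernel estimators, see for instance Härdle (1990)
and Scott (1992).

The second lemma estimates the expected difference between $f_{nh}$ and $\tilde{f}_{nh}$.
The bound is in terms of the functions

$$\gamma_0(h) = \frac{1}{2\pi} \int_{-1}^{1} \Big|\frac{\phi_w(s)}{\phi_k(s/h)}\Big|\,ds \tag{22}$$

and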

1 ■*■/), 1 Z 71 " 1 + k/\x\ \ , l + 7r/|x| . . 

-Uih, x) = eW* + _ exp ( - —JU) log —JU. (23) 

Lemma 4.2. Assume Condition 2.1 and that the first of Condition 3.1 holds
and that the invariant density of $\sigma_t^2$ is bounded in a neighbourhood of zero.
For $h \to 0$ and $\varepsilon > 0$ small enough we have

|EU(x)-EU(x)| = 

(«*r + l P 7o(M p ^ + j^vcr^h, | log 2s| A)- 



/: 


<i>w(s) 




<Pk(s/h) 


^7T 1 + 7r/ X 


V2 


h 
Proof of Theorem 3.3. Statement (16) follows by combining standard arguments
of kernel density estimation applied to expression (21) in Lemma 4.1
with Lemma 4.2. We will now show that the bound in Lemma 4.2 is essentially
a negative power of $n$, whereas $h^2$ is of logarithmic order. Recall that
we have assumed $\delta > 4p/\gamma$. It follows that $p/2\gamma < \delta/4 - p/2\gamma$, so we can
pick a $\beta \in (p/2\gamma, \delta/4 - p/2\gamma)$ and take $\varepsilon = n^{-\beta}$. By Lemmas 5.1 and 5.3,
up to factors that are logarithmic in $n$, the order of $|E f_{nh}(\mathbf{x}) - E \tilde{f}_{nh}(\mathbf{x})|$ is
then

$$n^{\frac{p}{2\gamma} - \frac{\delta}{4} + \beta} + n^{\frac{p}{2\gamma} - \frac{\delta}{2} + 2\beta} + n^{\frac{p}{2\gamma} - \beta}, \tag{24}$$

which is negligible with respect to $h^2 = \gamma^2\pi^2/(\log n)^2$ for the chosen values of the
parameters, since all three exponents are negative. □
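
The sign check on the three exponents in (24) is simple arithmetic and can be verified for concrete parameter values. The helper names below are hypothetical and the numbers in the usage note are arbitrary illustrative choices.

```python
def admissible_beta_interval(p, gamma, delta):
    """Open interval (p/(2*gamma), delta/4 - p/(2*gamma)) from which beta is picked."""
    lower = p / (2.0 * gamma)
    upper = delta / 4.0 - lower
    if not lower < upper:
        raise ValueError("interval is empty: the condition gamma > 4p/delta fails")
    return lower, upper

def bias_gap_exponents(p, gamma, delta, beta):
    """Exponents of n in the three terms of (24), ignoring logarithmic factors."""
    base = p / (2.0 * gamma)
    return (base - delta / 4.0 + beta,
            base - delta / 2.0 + 2.0 * beta,
            base - beta)
```

For example, with $p = 2$, $\delta = 0.5$ and $\gamma = 20 > 4p/\delta = 16$, the interval is $(0.05, 0.075)$, and any $\beta$ inside it makes all three exponents negative.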

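To prove the bound (17) we use the two lemmas below, which are proved in
the next section. First consider the variance of $\tilde{f}_{nh}(\mathbf{x})$.

Lemma 4.3. Assume Condition 2.1 and assume the second of Condition 3.1.
Assume also $\int |w(u)|^{2/(1-q)}\,du < \infty$ for the same $q$ and $n\Delta \to \infty$.
We have, for $h \to 0$,

$$\mathrm{Var}\,\tilde{f}_{nh}(\mathbf{x}) = O\Big(\frac{1}{n}\,h^{2p\rho}\,e^{p\pi/h}\Big) + O\Big(\frac{1}{n\,h^{p(1+q)}\,\Delta}\Big). \tag{25}$$

The next lemma estimates $\mathrm{Var}\,(f_{nh}(\mathbf{x}) - \tilde{f}_{nh}(\mathbf{x}))$.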

Lemma 4.4. Assume that Condition 2.1 and Condition 3.1 hold and let $\sigma_t^2$
have a bounded density in a neighbourhood of zero. We have, for $h \to 0$ and
$\varepsilon > 0$ small enough,

Var (f n/l (x) - fnft(x)) = 

°(^ 7o(/l)2P ^? + ^ 7o(/l)2P " 27l(/l ' 1 l0g2e|//l) ' 



log 2e| 2 



(26) 

+ rr^of— (27) 



n/i 2 PA V /i 2 e 2 

Remark 4.5. For $p = 1$, the order bounds of Lemma 4.4 reduce to those
of Lemma 4.3 in Van Es et al. (2003).

Proof of Theorem 3.4. The bound of (17) follows as soon as we show
that the estimate in Lemma 4.4 is of lower order than the one in Lemma 4.3.
Up to terms that are logarithmic in $n$, the bound in Lemma 4.3 is of order
$n^{\delta - 1}$. Choosing again $\varepsilon = n^{-\beta}$, by Lemmas 5.1 and 5.3 one finds that, up
to logarithmic factors, the order of $\mathrm{Var}\,(f_{nh}(\mathbf{x}) - \tilde{f}_{nh}(\mathbf{x}))$ is

n -l+*-f + n -l+*-/J + n -l+2/3 + ^ + „- (28) 

Recall our assumption Sj > 4p. If we pick f3 less than \ 8(1 — q), then all 
these terms are indeed of lower order than n 5 ^ 1 . □ 
5 Proof of Lemmas 4.1-4.4

We need expansions and order estimates for the kernel function $v_h$
as defined in (8), the function $\gamma_0$ as defined in (22) and the function $\gamma_1$ as defined in (23).
These are collected in the next technical lemmas, that are partially taken
from Van Es et al. (2003) and Van Es et al. (2005).

Lemma 5.1. Assume Condition 2.1. For $h \to 0$ we have

$$\gamma_0(h) = O\big(h^{1+\rho}\,e^{\pi/(2h)}\big). \tag{29}$$

Proof. See the proof of Lemma 5.3 in Van Es et al. (2003). □
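
The growth rate in (29) can be illustrated numerically. The sketch below rests on assumptions that are not taken from the paper: the example kernel with $\phi_w(s) = (1 - s^2)^3$ (so $\rho = 3$), Simpson's rule for the integral, and the function names. It uses the exact identity $|\phi_k(t)| = 1/\sqrt{\cosh(\pi t)}$, which follows from $|\Gamma(\frac12 + it)|^2 = \pi/\cosh(\pi t)$, so that $\gamma_0(h) = \frac{1}{\pi} \int_0^1 \phi_w(s) \sqrt{\cosh(\pi s/h)}\,ds$.

```python
import math

def phi_w(s):
    """Assumed example kernel: phi_w(s) = (1 - s^2)^3 on [-1, 1], so rho = 3."""
    return (1.0 - s * s) ** 3 if abs(s) <= 1.0 else 0.0

def gamma0(h, grid_size=2000):
    """gamma_0(h) by Simpson's rule on [0, 1]; grid_size must be even."""
    step = 1.0 / grid_size
    total = 0.0
    for m in range(grid_size + 1):
        s = m * step
        simpson = 1.0 if m in (0, grid_size) else (4.0 if m % 2 else 2.0)
        total += simpson * phi_w(s) * math.sqrt(math.cosh(math.pi * s / h))
    return total * step / 3.0 / math.pi

def scaled_gamma0(h, rho=3):
    """gamma_0(h) divided by the bound h^(1 + rho) * exp(pi / (2h)) of (29)."""
    return gamma0(h) / (h ** (1 + rho) * math.exp(math.pi / (2.0 * h)))
```

For this kernel the scaled quantity stays bounded as $h$ decreases (it remains below 2 for $h$ between 0.05 and 0.2), in line with (29), while $\gamma_0(h)$ itself grows exponentially in $1/h$.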



Lemma 5.2. Assume Condition 2.1. The functions $v_h$ and $\mathbf{v}_h$ are bounded
and Lipschitz. More precisely, for all $x$ we have $|v_h(x)| \le \gamma_0(h)$ and for all
$x$ and $u$, $|v_h(x + u) - v_h(x)| \le \gamma_0(h)\,|u|$. For all $p$-vectors $\mathbf{x}$ we have

$$|\mathbf{v}_h(\mathbf{x})| \le \gamma_0(h)^p \tag{30}$$

and for all $p$-vectors $\mathbf{x}$ and $\mathbf{u}$

$$|\mathbf{v}_h(\mathbf{x} + \mathbf{u}) - \mathbf{v}_h(\mathbf{x})| \le \gamma_0(h)^p \sum_{j=1}^{p} |u_j| \tag{31}$$

and for some $C > 0$,

$$|w(\mathbf{x} + \mathbf{u}) - w(\mathbf{x})| \le C \sum_{j=1}^{p} |u_j|. \tag{32}$$

Proof. The results for $v_h$ are known from Lemma 5.4 in Van Es et
al. (2003). The bound (30) follows by the product structure of $\mathbf{v}_h$.
Inequality (31) follows by induction and the same techniques can be used to prove
inequality (32). □



Lemma 5.3. Assume Condition 2.1. For $|x| \to \infty$ we have the following
estimate on the behavior of $v_h$. For some positive constant $D$ it holds that

$$|v_h(x)| \le D\,\frac{\gamma_1(h, |x|)}{|x|} \quad \text{as } |x| \to \infty, \tag{33}$$

and

7l(M)=0 ^^ e Hi+VM)/^ ash ^ . (34) 

Moreover, we have the following estimate on the behavior of $\mathbf{v}_h$. For some
positive constant $D$ it holds that, if the absolute value of at least one of the
components of $\mathbf{x}$ tends to infinity,

$$|\mathbf{v}_h(\mathbf{x})| \le D\,\gamma_0(h)^{p-1}\,\frac{\gamma_1(h, |x_1| \vee \ldots \vee |x_p|)}{|x_1| \vee \ldots \vee |x_p|}. \tag{35}$$

Proof. The estimates of (33) and (34) are taken from Lemma 5.5 of Van Es
et al. (2003). To show (35), we argue as follows. Let $x^* = \max\{|x_1|, \ldots, |x_p|\}$.
Without loss of generality we may assume that $x^* = |x_p|$. Use the bound
$|v_h| \le \gamma_0(h)$ of Lemma 5.2 and the bound in (33) to get
$|\mathbf{v}_h(\mathbf{x})| = \prod_{i=1}^{p-1} |v_h(x_i)|\,|v_h(x_p)| \le
D\,\gamma_0(h)^{p-1}\,\gamma_1(h, |x_p|)/|x_p| = D\,\gamma_0(h)^{p-1}\,\gamma_1(h, x^*)/x^*$. □

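We are now ready with the proof of Lemma 4.1. Recall that $\mathcal{F}_\sigma$ is the
$\sigma$-algebra generated by the process $\sigma$.

Proof of Lemma 4.1. Write

$$\tilde{f}_{nh}(\mathbf{x}) = \frac{1}{(n - i_p^\Delta + i_1^\Delta) h^p} \sum_{j=1}^{n - i_p^\Delta + i_1^\Delta} \frac{1}{(2\pi)^p} \int_{\mathbb{R}^p} \frac{\phi_w(\mathbf{s})}{\phi_k(\mathbf{s}/h)}\,\exp\Big(-i\,\mathbf{s} \cdot \frac{\mathbf{x} - \log \boldsymbol{\sigma}_j^2 - \log((\mathbf{Z}_j^\Delta)^2)}{h}\Big)\,d\mathbf{s}.$$

Conditionally on $\mathcal{F}_\sigma$, the vector $\log((\mathbf{Z}_j^\Delta)^2)$ has characteristic function $\phi_k$,
so taking the conditional expectation cancels the factor $1/\phi_k(\mathbf{s}/h)$.
By taking the expectation we get, using $|(i_j^\Delta - 1)\Delta - t_j| \le 2\Delta$, for $j =
1, \ldots, p$, and the uniform Lipschitz continuity of $f$,

$$\begin{aligned}
E \tilde{f}_{nh}(\mathbf{x}) &= E\,E\big(\tilde{f}_{nh}(\mathbf{x}) \mid \mathcal{F}_\sigma\big) = \frac{1}{h^p}\,E\,w\Big(\frac{\mathbf{x} - \log \boldsymbol{\sigma}_1^2}{h}\Big) \\
&= \frac{1}{h^p} \int \cdots \int w\Big(\frac{\mathbf{x} - \mathbf{u}}{h}\Big) f_{(i_1^\Delta - 1)\Delta, \ldots, (i_p^\Delta - 1)\Delta}(\mathbf{u})\,d\mathbf{u} \\
&= \frac{1}{h^p} \int \cdots \int w\Big(\frac{\mathbf{x} - \mathbf{u}}{h}\Big) f_{t_1,\ldots,t_p}(\mathbf{u})\,d\mathbf{u} \\
&\quad + \frac{1}{h^p} \int \cdots \int w\Big(\frac{\mathbf{x} - \mathbf{u}}{h}\Big) \Big(f_{(i_1^\Delta - 1)\Delta, \ldots, (i_p^\Delta - 1)\Delta}(\mathbf{u}) - f_{t_1,\ldots,t_p}(\mathbf{u})\Big)\,d\mathbf{u} \\
&= \frac{1}{h^p} \int \cdots \int w\Big(\frac{\mathbf{x} - \mathbf{u}}{h}\Big) f_{t_1,\ldots,t_p}(\mathbf{u})\,d\mathbf{u} + O(\Delta).
\end{aligned}$$

□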

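For the proof of Lemma 4.2 we recall, see Equations (30) and (31) in Van Es
et al. (2003), a few properties of the process $\sigma$, valid under Condition 3.1.
There exists a constant $C > 0$ such that

$$E\big(X_1^\Delta - \sigma_0 Z_1^\Delta\big)^2 \le C\Delta^{1/2} \quad \text{for } \Delta \to 0, \tag{36}$$

and

$$E\,\Big|\frac{1}{\Delta} \int_0^\Delta \sigma_t^2\,dt - \sigma_0^2\Big| \le C\Delta^{1/2} \quad \text{for } \Delta \to 0. \tag{37}$$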

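Proof of Lemma 4.2. We follow the lines of thought as in the proof of
Lemma 4.2 of Van Es et al. (2003), now applied in a multivariate setting.
Let $\|\cdot\|$ denote the Euclidean norm. Writing

$$W_j = \mathbf{v}_h\Big(\frac{\mathbf{x} - \log((\mathbf{X}_j^\Delta)^2)}{h}\Big) - \mathbf{v}_h\Big(\frac{\mathbf{x} - \log(\tilde{\mathbf{X}}_j^2)}{h}\Big), \tag{38}$$

so that $f_{nh}(\mathbf{x}) - \tilde{f}_{nh}(\mathbf{x}) = \frac{1}{(n - i_p^\Delta + i_1^\Delta) h^p} \sum_{j=1}^{n - i_p^\Delta + i_1^\Delta} W_j$, and defining the event
$A$ as the event that all components of $|\mathbf{X}_1^\Delta|$ and $|\tilde{\mathbf{X}}_1|$ are larger than or equal to
$\varepsilon$, we have

$$\begin{aligned}
|E f_{nh}(\mathbf{x}) - E \tilde{f}_{nh}(\mathbf{x})| &\le \frac{1}{h^p}\,E|W_1| && (39) \\
&= \frac{1}{h^p}\,E|W_1|\,I_A && (40) \\
&\quad + \frac{1}{h^p}\,E|W_1|\,I_{A^c}\,I_{[\|\mathbf{X}_1^\Delta - \tilde{\mathbf{X}}_1\| > \varepsilon]} && (41) \\
&\quad + \frac{1}{h^p}\,E|W_1|\,I_{A^c}\,I_{[\|\mathbf{X}_1^\Delta - \tilde{\mathbf{X}}_1\| \le \varepsilon]}. && (42)
\end{aligned}$$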



Recall that $|\log x - \log y| \le |x - y|/\varepsilon$ for $x, y \ge \varepsilon$. By Lemma 5.2, the
bound (36) and stationarity, the term (40) can be bounded by



hp- 



- 10 (hrY,v\io g (\x*\) - iog(\x lf \)\i A 



< 



2j> 1 7 o(M P E|Xf-l 1 |< I ^ T7o (/ l )^^. 



This gives the first term in the order bound of Lemma 4.2.

The boundedness of the function $\mathbf{v}_h$ as stated in Lemma 5.2 yields
$|W_1| \le 2\gamma_0(h)^p$. Using also Chebyshev's inequality and (36), we bound the
term (41) by



2 

7^ 



i- 10 (hyp(\\xf 



Xi|| > e) < ^~fo(h) p pP(\Xf - X,\ > -|) 
2p 2 A 1 / 2 
which gives the second term in the order bound of Lemma 4.2.

Consider the two arguments of the functions $\mathbf{v}_h$ in $W_1$. Since at
least one of them (and then the same for both arguments) is in absolute
value eventually larger than $|\log 2\varepsilon|/h$, by Lemma 5.3 the term (42) can be
bounded by
2 ^ 7o( , !rSl(ftJlog2£ | /A )_i_ pP( |.t 1 i < 2£) < 
ft^ocr 1 7i (h, i log fei/fcjp^, 

for some constant $C_2$, where we used in the last inequality the fact that the
density of $\tilde{X}_1$ is bounded. This follows from the assumption that $\sigma_0^2$ has a
bounded density in a neighbourhood of zero, as can easily be verified. □

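Proof of Lemma 4.3. Consider the decomposition

$$\mathrm{Var}\,(\tilde{f}_{nh}(\mathbf{x})) = \mathrm{Var}\,\big(E(\tilde{f}_{nh}(\mathbf{x}) \mid \mathcal{F}_\sigma)\big) + E\,\big(\mathrm{Var}\,(\tilde{f}_{nh}(\mathbf{x}) \mid \mathcal{F}_\sigma)\big). \tag{43}$$

By the proof of Lemma 4.1 the conditional expectation $E(\tilde{f}_{nh}(\mathbf{x}) \mid \mathcal{F}_\sigma)$ is
equal to a multivariate kernel estimator of the density of $\log \boldsymbol{\sigma}_1^2$. Adapting
the proof of Theorem 3 of Masry (1983) to the multivariate situation, we
can bound its variance by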
20(1+0(1)) r 



n 



fix) 1 -"^ / ^(tiJI^-^du) q j a(T) q dT 



which is of the order 0(l/(nh^ 1+q ^A)). This gives the second order bound 
in ((25|>. 

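We turn to the expectation of the conditional variance. Using Lemma 5.2 we can bound the 'diagonal terms' of the conditional variance in (43) by

$$\frac{1}{(n - i_p^\Delta + i_1^\Delta)h^{2p}}\, E\Big(\mathbf{v}_h\Big(\frac{\mathbf{x} - \log\tilde{\mathbf{X}}_1^2}{h}\Big)\Big)^2 = O\Big(\frac{\gamma_0(h)^{2p}}{nh^{2p}}\Big),$$

where we also used that $i_p^\Delta/n \to 0$.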

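Next we consider the 'cross terms' of the conditional variance. Since nonzero covariance can only occur if the vectors $\tilde{\mathbf{X}}_i$ and $\tilde{\mathbf{X}}_j$ have common elements, we investigate a 'worst case'. For fixed $i$, there are at most $p - 1$ among the $\tilde{\mathbf{X}}_j$ that have elements in common with $\tilde{\mathbf{X}}_i$, which yields

$$\frac{1}{(n - i_p^\Delta + i_1^\Delta)^2h^{2p}}\sum_{i\neq j} E\operatorname{Cov}\Big(\mathbf{v}_h\Big(\frac{\mathbf{x} - \log\tilde{\mathbf{X}}_i^2}{h}\Big), \mathbf{v}_h\Big(\frac{\mathbf{x} - \log\tilde{\mathbf{X}}_j^2}{h}\Big)\,\Big|\,\mathcal{F}_\sigma\Big) \le \frac{p-1}{(n - i_p^\Delta + i_1^\Delta)h^{2p}}\, E\,\mathbf{v}_h\Big(\frac{\mathbf{x} - \log\tilde{\mathbf{X}}_1^2}{h}\Big)^2 = O\Big(\frac{\gamma_0(h)^{2p}}{nh^{2p}}\Big),$$

where in the last inequality we used that the expectation of the conditional covariance is bounded in absolute value by $E\,\mathbf{v}_h\big((\mathbf{x} - \log\tilde{\mathbf{X}}_1^2)/h\big)^2$, due to stationarity. The first order bound in (25) follows by an application of Lemma 5.1. □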

Proof of Lemma 4.4. We will use arguments similar to those in the proof of Lemma 4.2. With $W_j$ as in (38) we have, using the ordinary variance decomposition and stationarity of the $W_j$,

$$\operatorname{Var}(f_{nh}(\mathbf{x}) - \tilde f_{nh}(\mathbf{x})) = \frac{1}{(n - i_p^\Delta + i_1^\Delta)h^{2p}}\operatorname{Var} W_1 + \frac{1}{(n - i_p^\Delta + i_1^\Delta)^2h^{2p}}\sum_{i\neq j}\operatorname{Cov}(W_i, W_j). \tag{44}$$

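Let us first derive a bound on $\operatorname{Var} W_1$. As in the proof of Lemma 4.2 we use $A$, the event that all components of $|\mathbf{X}_1^\Delta|$ and $|\tilde{\mathbf{X}}_1|$ are larger than or equal to $\varepsilon$. We have $\operatorname{Var} W_1 \le E W_1^2$, which can be split up as the sum of three terms

\begin{align}
E W_1^2 &= E W_1^2 I_A \tag{45}\\
&\quad + E W_1^2 I_{A^c} I_{[\|\mathbf{X}_1^\Delta - \tilde{\mathbf{X}}_1\| > \varepsilon]} \tag{46}\\
&\quad + E W_1^2 I_{A^c} I_{[\|\mathbf{X}_1^\Delta - \tilde{\mathbf{X}}_1\| \le \varepsilon]}. \tag{47}
\end{align}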

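By stationarity, the Lipschitz property of $\mathbf{v}_h$ in Lemma 5.2 and (36), the term (45) can be bounded by

\begin{align}
\frac{4}{h^2}\,\gamma_0(h)^{2p}\, E\Big(\sum_{j=1}^{p}\big|\log|X_{1j}^\Delta| - \log|\tilde X_{1j}|\big|\Big)^2 I_A
&\le \frac{4p}{h^2}\,\gamma_0(h)^{2p}\, E\sum_{j=1}^{p}\big(\log|X_{1j}^\Delta| - \log|\tilde X_{1j}|\big)^2 I_A \notag\\
&\le \frac{4p^2}{h^2\varepsilon^2}\,\gamma_0(h)^{2p}\, E\big(|X_1^\Delta| - |\tilde X_1|\big)^2 \notag\\
&\le \frac{4p^2}{h^2\varepsilon^2}\,\gamma_0(h)^{2p}\, E\big(X_1^\Delta - \tilde X_1\big)^2 = O\Big(\gamma_0(h)^{2p}\,\frac{\Delta}{h^2\varepsilon^2}\Big). \tag{48}
\end{align}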
We turn to the term (46). By the bound on $\mathbf{v}_h$ of Lemma 5.2 and by (36) again, it can be bounded by

$$4\gamma_0(h)^{2p}\, P\big(\|\mathbf{X}_1^\Delta - \tilde{\mathbf{X}}_1\| > \varepsilon\big) \le 4\gamma_0(h)^{2p}\, p\, P\Big(|X_1^\Delta - \tilde X_1| > \frac{\varepsilon}{p}\Big) \le 4p^2\,\gamma_0(h)^{2p}\, C\,\frac{\Delta^{1/2}}{\varepsilon}.$$

Due to the absence of a factor $h^2$ in the denominator, this bound is of smaller order than the one for (45) and will therefore be neglected.

Next we consider (47). Recall from the proof of Lemma 4.2 that $P(|\tilde X_1| \le 2\varepsilon) = O(\varepsilon)$. Since at least one (the same) coordinate of the absolute value of both arguments of $\mathbf{v}_h$ is eventually larger than $|\log 2\varepsilon|/h$, by Lemma 5.3 the term (47) can be bounded by

$$4D^2\,\gamma_0(h)^{2p-2}\,\frac{\gamma_1(h, |\log 2\varepsilon|/h)^2}{|\log 2\varepsilon|^2/h^2}\; p\, P(|\tilde X_1| \le 2\varepsilon) \le C_2\, h^2\,\gamma_0(h)^{2p-2}\,\gamma_1(h, |\log 2\varepsilon|/h)^2\,\frac{\varepsilon}{|\log 2\varepsilon|^2}, \tag{49}$$

for some constant $C_2$.

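Wrapping up the order bounds (48) and (49) for $E W_1^2$, we get

$$E W_1^2 = O\Big(\frac{1}{h^2\varepsilon^2}\,\gamma_0(h)^{2p}\,\Delta + h^2\,\gamma_0(h)^{2p-2}\,\gamma_1(h, |\log 2\varepsilon|/h)^2\,\frac{\varepsilon}{|\log 2\varepsilon|^2}\Big), \tag{50}$$

which, substituted in (44), gives the order bounds of (26).

We now consider the covariance terms in (44), which will be seen to have the order bounds of (27). We have the decomposition

$$\operatorname{Cov}(W_i, W_j) = E\operatorname{Cov}(W_i, W_j\,|\,\mathcal{F}_\sigma) + \operatorname{Cov}\big(E(W_i\,|\,\mathcal{F}_\sigma), E(W_j\,|\,\mathcal{F}_\sigma)\big). \tag{51}$$

The last term in (44) then becomes

\begin{align}
&\frac{1}{(n - i_p^\Delta + i_1^\Delta)^2h^{2p}}\sum_{i\neq j} E\operatorname{Cov}(W_i, W_j\,|\,\mathcal{F}_\sigma) \tag{52}\\
&\quad + \frac{1}{(n - i_p^\Delta + i_1^\Delta)^2h^{2p}}\sum_{i\neq j}\operatorname{Cov}\big(E(W_i\,|\,\mathcal{F}_\sigma), E(W_j\,|\,\mathcal{F}_\sigma)\big). \tag{53}
\end{align}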



In a first step we consider the expectation of the conditional covariances in (52). Arguing as in the proof of Lemma 4.3 we can bound it by

$$\frac{p-1}{(n - i_p^\Delta + i_1^\Delta)h^{2p}}\operatorname{Var} W_1,$$

which is $p - 1$ times the first term on the right hand side of Equation (44). Hence its contribution can be absorbed in the already obtained bounds of (26).
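Next we concentrate on the sum of covariances in (53). Define

$$\bar\sigma_i^2 = \frac{1}{\Delta}\int_{(i-1)\Delta}^{i\Delta}\sigma_t^2\,dt \tag{54}$$

and the vector $\bar{\boldsymbol{\sigma}}_j$ by $\bar{\boldsymbol{\sigma}}_j = (\bar\sigma_{i_1^\Delta + j - 1}, \ldots, \bar\sigma_{i_p^\Delta + j - 1})$. Note that given $\mathcal{F}_\sigma$, $\mathbf{X}_j^\Delta$ is a multivariate normal vector with independent components with variances equal to the components of $\bar{\boldsymbol{\sigma}}_j^2$ and that $\tilde{\mathbf{X}}_j$ is a multivariate normal vector with independent components with variances equal to the components of $\boldsymbol{\sigma}_{j-1}^2$. As in the proof of Lemma 4.1 it follows that

$$E(W_j\,|\,\mathcal{F}_\sigma) = \mathbf{w}\Big(\frac{\mathbf{x} - \log(\bar{\boldsymbol{\sigma}}_j^2)}{h}\Big) - \mathbf{w}\Big(\frac{\mathbf{x} - \log(\boldsymbol{\sigma}_{j-1}^2)}{h}\Big).$$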



We follow the line of arguments in the proof of Theorem 3 in Masry (1983). The stationarity of $W_j$ implies that also the conditional expectations $\bar W_j := E(W_j\,|\,\mathcal{F}_\sigma)$ are stationary. Hence we have

$$\sum_{i\neq j}\operatorname{Cov}(\bar W_i, \bar W_j) = 2\sum_{k=1}^{n-1}(n-k)\operatorname{Cov}(\bar W_0, \bar W_k).$$
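The identity above only uses that, by stationarity, the covariance of two terms depends on their lag alone: each lag $k$ occurs for exactly $2(n-k)$ ordered pairs $i \neq j$. A minimal sketch that checks this counting argument exactly, using a hypothetical geometrically decaying covariance function (our own illustrative choice, not one taken from the paper):

```python
from fractions import Fraction


def offdiagonal_sum(cov, n):
    """Sum cov(|i - j|) over all ordered pairs i != j with 0 <= i, j < n."""
    return sum(cov(abs(i - j)) for i in range(n) for j in range(n) if i != j)


def lag_weighted_sum(cov, n):
    """Right-hand side of the identity: 2 * sum_{k=1}^{n-1} (n - k) * cov(k)."""
    return 2 * sum((n - k) * cov(k) for k in range(1, n))


def geometric_cov(k):
    """Hypothetical stationary covariance, decaying geometrically in the lag."""
    return Fraction(1, 2 ** k)
```

Exact rational arithmetic is used so that both sides can be compared with `==` rather than up to a tolerance.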

Now note that the process $\bar W_j$ is strongly mixing with a mixing coefficient $\bar\alpha(k) \le \alpha((k-2)\Delta + t_1 - t_p)$, $k = 1, 2, \ldots$, if $k\Delta > t_p - t_1 + 2\Delta$, and $\bar\alpha(k) = 1$ otherwise. By a lemma of Deo (1973) for strongly mixing processes it follows that for all $r > 0$

$$|\operatorname{Cov}(\bar W_0, \bar W_k)| \le 10\,\alpha((k-2)\Delta + t_1 - t_p)^{r/(2+r)}\big(E|\bar W_1|^{2+r}\big)^{2/(2+r)}. \tag{55}$$



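By the equivalent condition (15) on the mixing coefficients $\alpha(t)$ (applied with $r = 2q/(1-q)$, a choice for $r$ that we will make later on as well), we get for (53)

\begin{align*}
\frac{1}{(n - i_p^\Delta + i_1^\Delta)^2h^{2p}}\Big|\sum_{i\neq j}\operatorname{Cov}(\bar W_i, \bar W_j)\Big|
&\le \frac{20}{(n - i_p^\Delta + i_1^\Delta)h^{2p}}\big(E|\bar W_1|^{2+r}\big)^{2/(2+r)}\sum_{k=1}^{n-1}\Big(1 - \frac{k}{n - i_p^\Delta + i_1^\Delta}\Big)\alpha(k\Delta + t_1 - t_p - 2\Delta)^{q}\\
&\le \frac{10\,C(q, t_1 - t_p - 2\Delta)}{(n - i_p^\Delta + i_1^\Delta)h^{2p}\Delta}\big(E|\bar W_1|^{2+r}\big)^{2/(2+r)}.
\end{align*}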

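Next we derive a bound on $E|\bar W_1|^{2+r}$. Fix $\kappa \in (0, 1]$ and define the event $B$ as the event that all components of $|\bar{\boldsymbol{\sigma}}_1^2|$ and $|\boldsymbol{\sigma}_0^2|$ are larger than or equal to $\varepsilon$. We have

\begin{align}
E|\bar W_1|^{2+r} &= E\Big|\mathbf{w}\Big(\frac{\mathbf{x} - \log(\bar{\boldsymbol{\sigma}}_1^2)}{h}\Big) - \mathbf{w}\Big(\frac{\mathbf{x} - \log(\boldsymbol{\sigma}_0^2)}{h}\Big)\Big|^{2+r} I_B \tag{56}\\
&\quad + E\Big|\mathbf{w}\Big(\frac{\mathbf{x} - \log(\bar{\boldsymbol{\sigma}}_1^2)}{h}\Big) - \mathbf{w}\Big(\frac{\mathbf{x} - \log(\boldsymbol{\sigma}_0^2)}{h}\Big)\Big|^{2+r} I_{B^c} I_{[\|\bar{\boldsymbol{\sigma}}_1^{2\kappa} - \boldsymbol{\sigma}_0^{2\kappa}\| > \varepsilon]} \tag{57}\\
&\quad + E\Big|\mathbf{w}\Big(\frac{\mathbf{x} - \log(\bar{\boldsymbol{\sigma}}_1^2)}{h}\Big) - \mathbf{w}\Big(\frac{\mathbf{x} - \log(\boldsymbol{\sigma}_0^2)}{h}\Big)\Big|^{2+r} I_{B^c} I_{[\|\bar{\boldsymbol{\sigma}}_1^{2\kappa} - \boldsymbol{\sigma}_0^{2\kappa}\| \le \varepsilon]}. \tag{58}
\end{align}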



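By Lemma 5.2 the term (56) can be bounded by a constant times

\begin{align}
\frac{1}{h^{2+r}}\, E\Big(\sum_{j=1}^{p}\big|\log((\bar{\boldsymbol{\sigma}}_1^2)_j) - \log((\boldsymbol{\sigma}_0^2)_j)\big|\Big)^{2+r} I_B
&\le \frac{p^{2+r}}{h^{2+r}}\, E\big|\log(\bar\sigma_1^2) - \log(\sigma_0^2)\big|^{2+r} I_B \notag\\
&\le \frac{p^{2+r}}{(\kappa\varepsilon h)^{2+r}}\, E\big|\bar\sigma_1^{2\kappa} - \sigma_0^{2\kappa}\big|^{2+r}. \tag{59}
\end{align}

The term (57) can be bounded by a constant times

$$p\, P\Big(\big|\bar\sigma_1^{2\kappa} - \sigma_0^{2\kappa}\big| > \frac{\varepsilon}{p}\Big) \le \frac{p^{3+r}}{\varepsilon^{2+r}}\, E\big|\bar\sigma_1^{2\kappa} - \sigma_0^{2\kappa}\big|^{2+r}.$$

Since this is for $h \to 0$ of smaller order than (59), it will be neglected in the sequel.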

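Finally we analyze the term (58). On the complement of $B$ there is at least one component of either $|\bar{\boldsymbol{\sigma}}_1^2|$ or $|\boldsymbol{\sigma}_0^2|$ that is smaller than or equal to $\varepsilon$. Together with $\|\bar{\boldsymbol{\sigma}}_1^{2\kappa} - \boldsymbol{\sigma}_0^{2\kappa}\| \le \varepsilon$ this implies that there is at least one pair of corresponding components of the vectors that are both smaller than $\varepsilon(1 + \varepsilon^{1-\kappa})^{1/\kappa}$. Using the stationarity, we bound the term (58) by

$$p\, P\big(\bar\sigma_1^2 \le \varepsilon(1 + \varepsilon^{1-\kappa})^{1/\kappa} \text{ and } \sigma_0^2 \le \varepsilon(1 + \varepsilon^{1-\kappa})^{1/\kappa}\big),$$

which is bounded by

$$p\, P(\sigma_0^2 \le 2\varepsilon) = O(\varepsilon), \tag{60}$$

since $\sigma_0^2$ was assumed to have a bounded density in a neighbourhood of zero.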
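Combining (59) and (60) with $r = 2q/(1-q)$ and $\kappa = \frac{1}{2+r} = \frac{1-q}{2}$, we have with an application of the basic inequality $|u^\kappa - v^\kappa| \le |u - v|^\kappa$ for $u, v \ge 0$ and $\kappa \in (0, 1]$ in the second equality below and (37) in the fourth equality for the term (53)

\begin{align*}
\frac{1}{(n - i_p^\Delta + i_1^\Delta)^2h^{2p}}\Big|\sum_{i\neq j}\operatorname{Cov}(\bar W_i, \bar W_j)\Big|
&= O\Big(\frac{1}{(n - i_p^\Delta + i_1^\Delta)h^{2p}\Delta}\Big(\frac{1}{h^{2+r}\varepsilon^{2+r}}\, E\big|\bar\sigma_1^{2\kappa} - \sigma_0^{2\kappa}\big|^{2+r} + \varepsilon\Big)^{2/(2+r)}\Big)\\
&= O\Big(\frac{1}{(n - i_p^\Delta + i_1^\Delta)h^{2p}\Delta}\Big(\frac{1}{h^{2+r}\varepsilon^{2+r}}\, E\big|\bar\sigma_1^{2} - \sigma_0^{2}\big| + \varepsilon\Big)^{2/(2+r)}\Big)\\
&= O\Big(\frac{1}{(n - i_p^\Delta + i_1^\Delta)h^{2p}\Delta}\Big(\frac{\big(E|\bar\sigma_1^{2} - \sigma_0^{2}|\big)^{2/(2+r)}}{h^2\varepsilon^2} + \varepsilon^{2/(2+r)}\Big)\Big)\\
&= O\Big(\frac{1}{(n - i_p^\Delta + i_1^\Delta)h^{2p}\Delta}\Big(\frac{\Delta^{1/(2+r)}}{h^2\varepsilon^2} + \varepsilon^{2/(2+r)}\Big)\Big)\\
&= O\Big(\frac{1}{(n - i_p^\Delta + i_1^\Delta)h^{2p}\Delta}\Big(\frac{\Delta^{(1-q)/2}}{h^2\varepsilon^2} + \varepsilon^{1-q}\Big)\Big).
\end{align*}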
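The exponent bookkeeping in this chain rests on the identity $2 + r = 2/(1-q)$ for $r = 2q/(1-q)$, which gives $\kappa(2+r) = 1$, $2/(2+r) = 1-q$ and $r/(2+r) = q$. The sketch below checks these relations in exact arithmetic and tests the basic inequality $|u^\kappa - v^\kappa| \le |u - v|^\kappa$ on a grid. The function names, the grid and the tolerance are illustrative assumptions of ours.

```python
from fractions import Fraction


def exponents(q):
    """Return (r, kappa) for r = 2q/(1-q) and kappa = 1/(2+r), with 0 < q < 1."""
    q = Fraction(q)
    r = 2 * q / (1 - q)
    kappa = 1 / (2 + r)
    return r, kappa


def power_gap(u, v, kappa):
    """Return |u - v|^kappa - |u^kappa - v^kappa|; nonnegative for u, v >= 0."""
    return abs(u - v) ** kappa - abs(u ** kappa - v ** kappa)


def check_power_inequality(kappa, steps=50, upper=5.0):
    """Check the inequality on a regular grid in [0, upper]^2, up to rounding."""
    grid = [upper * i / steps for i in range(steps + 1)]
    return all(power_gap(u, v, kappa) >= -1e-12 for u in grid for v in grid)
```

For example, `exponents(Fraction(1, 3))` returns $r = 1$ and $\kappa = 1/3$, so that the moment of order $2 + r = 3$ of $|\bar\sigma_1^{2\kappa} - \sigma_0^{2\kappa}|$ is bounded by the first moment of $|\bar\sigma_1^2 - \sigma_0^2|$.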

Hence the last term in (44) now gives the third order bound ([2